Approaches to Multiple Record Linkage
Abstract
We review the theory and techniques of record linkage, which date back to the pioneering work of Fellegi and Sunter on matching records in two lists. When the task involves linking $K > 2$ lists, the most common approach is to link all $\binom{K}{2}$ possible pairs of lists using a Fellegi-Sunter-like approach and then reconcile the discrepancies in an ad hoc fashion. We describe some important uses of the methodology, provide a principled way of accomplishing the reconciliation, and finally present some key parts of the generalization of Fellegi and Sunter's method to $K > 2$ lists.
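To make the pairwise approach concrete, here is a minimal Python sketch of why linking every pair of lists separately can produce discrepancies that need reconciling. It is an illustration under stated assumptions, not the method of the paper: the field names, the m- and u-probabilities, the threshold of 0, and the helper names `fs_weight`, `pairwise_links`, and `intransitive_triples` are all hypothetical. It scores record pairs with a Fellegi-Sunter-style sum of log-likelihood ratios, links each pair of lists independently, and then reports triples where A links to B and B links to C but A does not link to C.

```python
from itertools import combinations
from math import log2

# Hypothetical per-field probabilities (assumed for illustration only):
# m = P(field agrees | records match), u = P(field agrees | records do not match).
M_U = {"surname": (0.90, 0.01), "birth_year": (0.85, 0.02)}


def fs_weight(rec_a, rec_b):
    """Sum of log2 likelihood-ratio weights over the comparison fields."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if rec_a[field] == rec_b[field]:
            total += log2(m / u)              # agreement weight
        else:
            total += log2((1 - m) / (1 - u))  # disagreement weight
    return total


def pairwise_links(lists, threshold=0.0):
    """Link each of the K-choose-2 pairs of lists independently."""
    links = set()
    for i, j in combinations(range(len(lists)), 2):
        for a, rec_a in enumerate(lists[i]):
            for b, rec_b in enumerate(lists[j]):
                if fs_weight(rec_a, rec_b) > threshold:
                    links.add(((i, a), (j, b)))
    return links


def intransitive_triples(links):
    """Find x~y and y~z, with x and z in different lists, where x~z is missing."""
    neighbours = {}
    for x, y in links:
        neighbours.setdefault(x, set()).add(y)
        neighbours.setdefault(y, set()).add(x)
    conflicts = []
    for y, adj in neighbours.items():
        for x, z in combinations(sorted(adj), 2):
            if x[0] != z[0] and z not in neighbours[x]:
                conflicts.append((x, y, z))
    return conflicts


# Three toy lists (K = 3), one record each; records are keyed as (list index, row index).
lists = [
    [{"surname": "smith", "birth_year": 1980}],
    [{"surname": "smith", "birth_year": 1981}],
    [{"surname": "smyth", "birth_year": 1981}],
]
links = pairwise_links(lists)
conflicts = intransitive_triples(links)
print(sorted(links))
print(conflicts)
```

In this toy data the first and second records agree on surname, the second and third agree on birth year, and the first and third agree on nothing, so the three pairwise decisions are not mutually consistent. Resolving such conflicts is the reconciliation step the abstract refers to; the sketch only detects them.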
Similar resources
Multiple Instance Learning for Group Record Linkage
Record linkage is the process of identifying records that refer to the same entities from different data sources. While most research efforts are concerned with linking individual records, new approaches have recently been proposed to link groups of records across databases. Group record linkage aims to determine if two groups of records in two databases refer to the same entity or not. One app...
Assessing and Mitigating Disclosure Risk with Multiple Record Linkage
This study examines privacy disclosure risks when multiple records in a dataset are associated with the same individual. Existing data privacy approaches typically assume that each individual in a dataset corresponds to a single record, which tends to underestimate the disclosure risks in the multiple-record problems. We propose a novel privacy approach, which uses a measure called g-balance to...
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When comprehensive information about a topic is scattered across two or more data sets, using only one of them loses the information available in the others. Hence, it is necessary to integrate the scattered information into a single comprehensive data set. On the other hand, sometimes we are interested in recognizing duplicates within a data set. The i...
Scalable Multi-Database Privacy-Preserving Record Linkage using Counting Bloom Filters
Privacy-preserving record linkage (PPRL) aims at integrating sensitive information from multiple disparate databases of different organizations. PPRL approaches are increasingly required in real-world application areas such as healthcare, national security, and business. Previous approaches have mostly focused on linking only two databases as well as the use of a dedicated linkage unit. Scaling...
Distance-based and probabilistic record linkage for re-identification of records with categorical variables
Record linkage methods are methods for identifying the presence of the same individual in different data files (re-identification). This paper studies and compares the two main existing approaches for record linkage: probabilistic and distance-based. The performance of both approaches is compared when data are categorical. To that end, a distance over ordinal and nominal scales is defined. The ...